Implement IAMF decoder #3709

osagie98 · 2024-06-27T21:58:11Z

Implements a libiamf-based audio decoder.

Decoder-specific code is guarded behind the GN flag enable_iamf_decode, which controls the ENABLE_IAMF_DECODE macro. The flag is off by default, so the new code is not built unless it is explicitly enabled.

The decoder is implemented for Linux and Android TV.

b/341792042

Also clean up logs and reformat decoder.

starboard/linux/x64x11/shared/platform_configuration/BUILD.gn

starboard/shared/libiamf/iamf_config_reader.cc

jasonzhangxx · 2024-08-06T20:36:53Z

starboard/shared/libiamf/iamf_config_reader.h

+  std::optional<uint32_t> mix_presentation_id_;
+
+  bool has_valid_config_ = false;
+  bool binaural_mix_presentation_id_ = -1;


Are "has_valid_config_" and "binaural_mix_presentation_id_" used?

jasonzhangxx · 2024-08-06T20:47:34Z

starboard/shared/libiamf/iamf_config_reader.cc

+
+  int bytes_read = 0;
+  bool error = true;
+  while (bytes_read < 128 && buf[bytes_read] != '\0') {


Copying all chars together should have better performance. We can find '\0' first and then copy the needed chars.
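A minimal sketch of the suggestion above, assuming the string is NUL-terminated and limited to 128 bytes as in the quoted loop. `ReadCString` and its parameters are hypothetical names, not part of the PR. It locates the terminator with `std::memchr` and then copies the bytes in one call instead of one at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from the PR): reads a NUL-terminated string of at
// most |max_len| bytes from |buf|. Returns false if no terminator is found
// within min(buf_size, max_len) bytes.
bool ReadCString(const uint8_t* buf,
                 size_t buf_size,
                 size_t max_len,
                 std::string* out) {
  const size_t limit = buf_size < max_len ? buf_size : max_len;
  // Find the terminator first.
  const void* end = std::memchr(buf, '\0', limit);
  if (end == nullptr) {
    return false;  // Unterminated within the allowed range.
  }
  const size_t len = static_cast<const uint8_t*>(end) - buf;
  // Copy all characters in a single call.
  out->assign(reinterpret_cast<const char*>(buf), len);
  return true;
}
```

Passing the remaining buffer size also keeps the scan from running past the end of the input when the terminator is missing.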

jasonzhangxx · 2024-08-06T20:56:23Z

starboard/shared/libiamf/iamf_config_reader.cc

+
+      uint32_t codec_id = 0;
+      std::memcpy(&codec_id, &buf[buffer_head_], sizeof(uint32_t));
+      // Mp4 is in big-endian


Let's refine the comment here. Shall we always swap the bytes of codec id? And do we need to swap the bytes of any other field?

Talked to Kaido offline, and confirmed that Cobalt only supports little endian. As long as this code won't be upstreamed, it's ok to assume that the platform is little endian. However, please comment and DCHECK() somewhere (say in the ctor).

Added a preprocessor check in the ctor
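For reference, an alternative to swapping after `memcpy` is to assemble the big-endian value byte by byte, which gives the same result on any host. This is only a sketch of that alternative, not what the PR does; `ReadBigEndianU32` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper (not from the PR): assembles a big-endian 32-bit value
// one byte at a time, so the result does not depend on host endianness.
inline uint32_t ReadBigEndianU32(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) |
         static_cast<uint32_t>(p[3]);
}
```

With this form, the four bytes `'O' 'p' 'u' 's'` read as 0x4F707573 regardless of platform, so no little-endian assumption is needed for this field.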

starboard/shared/libiamf/iamf_audio_decoder.cc

jasonzhangxx · 2024-08-06T21:55:20Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+    if (kEnableSurroundAudio) {
+      SbMediaAudioConfiguration out_config;
+      SbMediaGetAudioConfiguration(0, &out_config);
+      int channels = std::max(out_config.number_of_channels, 2);


Shall we cap the channels to 2 here?

This only executes if we want to force surround sound - it's 2 channels by default.

I think the audio output channel count can be any number. We don't have to report an error if the decoder can't downmix/upmix the audio to match the output channels. For example, if the audio output device supports up to 10 channels and the decoder supports up to 8 output channels, we can set the decoder output channels to 8.

But let's consult with the h5 player team to see their preference.
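The clamping described above could look roughly like this. It is a sketch only: `ChooseOutputChannels` and both parameters are hypothetical, and the stereo floor mirrors the `std::max(..., 2)` in the quoted diff rather than any confirmed player requirement.

```cpp
#include <algorithm>

// Hypothetical helper (not from the PR): picks the decoder output channel
// count from the device capability and the decoder's own limit.
int ChooseOutputChannels(int device_max_channels, int decoder_max_channels) {
  constexpr int kStereo = 2;  // Assumed minimum, as in the quoted diff.
  // Never ask the decoder for more channels than it or the device supports.
  return std::max(kStereo,
                  std::min(device_max_channels, decoder_max_channels));
}
```

For the example in the comment, a 10-channel device and an 8-channel decoder limit yield 8.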

jasonzhangxx · 2024-08-06T22:00:03Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+      SB_DLOG(INFO) << "Defaulting to stereo output.";
+    }
+
+    error = IAMF_decoder_output_layout_set_sound_system(decoder_, sound_system);


SbMediaGetAudioConfiguration() will return the audio output device capabilities. I don't think we should set the output sound system based on the audio output devices directly. For example, if the input is stereo and we have 5.1 audio output system, we'll set output sound system to 5.1 for stereo input here. Is that expected? And what will happen if we do that?

The IAMF decoder will upmix the stereo signal to 5.1, though that may not be desirable. I think we'll need some way for the player to signal to Cobalt if it wants 5.1 or stereo, without changing the audio stream.

For demo purposes I'll need a separate build that forces an SbPlayer to be created with 6 channels, since we can't tell based on the stream if the player expects 6 channels or not.

Let's create a bug and consult the h5 player team on whether and when they want the IAMF decoder to upmix the audio.

b/364414955

starboard/shared/libiamf/iamf_audio_decoder.cc

starboard/android/shared/media_is_audio_supported.cc

starboard/shared/libiamf/iamf_audio_decoder.h

starboard/shared/libiamf/iamf_audio_decoder.cc

starboard/shared/libiamf/iamf_config_reader.h

xiaomings · 2024-08-07T19:25:00Z

starboard/shared/libiamf/iamf_config_reader.cc

+
+// Decodes an Leb128 value and stores it in |value|. Returns the number of bytes
+// read. Returns -1 on error.
+int ReadLeb128Value(const uint8_t* buf, uint32_t* value) {


We should have unit tests for this, feel free to implement in a pending PR.

xiaomings · 2024-08-07T19:26:51Z

starboard/shared/libiamf/iamf_config_reader.cc

+    case kObuTypeCodecConfig: {
+      sample_rate_ = 0;
+      uint32_t codec_config_id;
+      bytes_read = ReadLeb128Value(&buf[buffer_head_], &codec_config_id);


Just want to double check if there is logic here to ensure that there are enough bytes for ReadLeb128Value() to read, instead of reading out of bounds, as we don't pass size remaining to ReadLeb128Value().
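One way to address this is to pass the remaining size into the reader. The sketch below is an assumption-laden illustration, not the PR's code: `ReadLeb128Checked` is a hypothetical name, and the 8-byte cap and 32-bit value limit reflect my reading of the IAMF leb128() definition and should be checked against the spec.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bounds-checked variant (not from the PR). Decodes an unsigned
// LEB128 value from at most |size| bytes of |buf| into |value|. Returns the
// number of bytes consumed, or -1 on error.
int ReadLeb128Checked(const uint8_t* buf, size_t size, uint32_t* value) {
  constexpr size_t kMaxLeb128Bytes = 8;  // Assumed cap; verify against spec.
  uint64_t result = 0;
  for (size_t i = 0; i < kMaxLeb128Bytes; ++i) {
    if (i >= size) {
      return -1;  // The next byte would be out of bounds.
    }
    const uint8_t byte = buf[i];
    // The low 7 bits carry payload, least significant group first.
    result |= static_cast<uint64_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {  // A clear high bit marks the last byte.
      if (result > UINT32_MAX) {
        return -1;  // The value does not fit in 32 bits.
      }
      *value = static_cast<uint32_t>(result);
      return static_cast<int>(i + 1);
    }
  }
  return -1;  // No terminating byte within the cap.
}
```

The same function gives unit tests obvious cases to cover: a single-byte value, a multi-byte value, a truncated buffer, and a zero-padded encoding.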

xiaomings · 2024-08-07T19:28:31Z

starboard/shared/libiamf/iamf_config_reader.cc

+    }
+  }
+
+  SB_CHECK(completed_parsing);


This will crash in production, just want to double check if this is intended.

completed_parsing should be set to true in ReadOBU() line 229, after the reader finishes parsing the Descriptor OBUs and begins to read the encoded audio data. Though this variable is no longer needed.

xiaomings · 2024-08-07T19:30:14Z

starboard/shared/libiamf/iamf_config_reader.cc

+  const uint8_t* buf = input_buffer->data();
+  SB_DCHECK(buf);
+
+  bool completed_parsing = false;


This seems to indicate that the OBU is the last one. Consider naming it something like "is_last_obu".

I've removed this variable.

xiaomings · 2024-08-07T19:34:36Z

starboard/shared/libiamf/iamf_config_reader.cc

+    return false;
+  }
+
+  int next_obu_pos = buffer_head_ + obu_size;


It's not obvious that buf as a parameter and buffer_head_ as a member variable are used together for the parsing head here. Consider refactoring, either by passing buf and advancing buf inside ReadOBU() and ReadOBUHeader(), or passing buffer_head as parameter to these functions.

I've passed the buffer position as a parameter.
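As an illustration of the first option in the suggestion (keeping the buffer and the parsing head together), a small cursor type could own both. `ByteCursor` and its methods are hypothetical and only sketch the idea; they are not the class used in the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical cursor (not from the PR) that keeps the buffer pointer and the
// read position in one place, so callers cannot let them drift apart.
class ByteCursor {
 public:
  ByteCursor(const uint8_t* data, size_t size) : data_(data), size_(size) {}

  // Reads one byte and advances. Returns false at the end of the buffer.
  bool ReadU8(uint8_t* out) {
    if (pos_ >= size_) {
      return false;
    }
    *out = data_[pos_++];
    return true;
  }

  // Advances by |count| bytes. Returns false, without moving, if fewer than
  // |count| bytes remain.
  bool Skip(size_t count) {
    if (count > size_ - pos_) {
      return false;
    }
    pos_ += count;
    return true;
  }

  size_t pos() const { return pos_; }
  size_t remaining() const { return size_ - pos_; }

 private:
  const uint8_t* data_;
  size_t size_;
  size_t pos_ = 0;
};
```

A function such as ReadOBU() would then take a `ByteCursor*` and jump to the next OBU with `Skip()`, which fails cleanly when the declared OBU size exceeds the remaining bytes.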

starboard/shared/libiamf/iamf_config_reader.h

starboard/shared/libiamf/iamf_config_reader.cc

starboard/shared/libiamf/iamf_config_reader.h

osagie98 · 2024-08-29T20:55:19Z

starboard/shared/libiamf/iamf_config_reader.h

+
+  bool ResetAndRead(scoped_refptr<InputBuffer> input_buffer);
+
+  void Reset();


osagie98 · 2024-08-29T20:55:29Z

starboard/shared/libiamf/iamf_config_reader.h

+
+  void Reset();
+
+  bool is_valid() {


starboard/shared/libiamf/iamf_audio_decoder.h

osagie98 · 2024-08-29T22:51:31Z

starboard/shared/libiamf/iamf_config_reader.h

+  uint32_t config_size() { return config_size_; }
+  // TODO: Allow for selection of multiple mix presentation IDs. Currently,
+  // only the first mix presentation parsed is selected.
+  bool has_mix_presentation_id() { return mix_presentation_id_.has_value(); }


starboard/shared/libiamf/iamf_audio_decoder.cc

starboard/android/shared/media_is_audio_supported.cc

starboard/shared/libiamf/iamf_audio_decoder.cc

starboard/shared/libiamf/iamf_config_reader.h

jasonzhangxx · 2024-09-03T20:32:24Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+    std::copy(std::begin(input_buffers), std::end(input_buffers),
+              std::back_inserter(pending_audio_buffers_));
+    consumed_cb_ = consumed_cb;
+    DecodePendingBuffers();


Yeah, let's try to make it more straightforward if that's not necessary for IAMF decoder.

jasonzhangxx · 2024-09-03T20:40:05Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+      SB_DLOG(INFO) << "Defaulting to stereo output.";
+    }
+
+    error = IAMF_decoder_output_layout_set_sound_system(decoder_, sound_system);


Let's create a bug and consult the h5 player team on whether and when they want the IAMF decoder to upmix the audio.

starboard/shared/libiamf/iamf_audio_decoder.cc

jasonzhangxx · 2024-09-03T20:59:18Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+    if (kEnableSurroundAudio) {
+      SbMediaAudioConfiguration out_config;
+      SbMediaGetAudioConfiguration(0, &out_config);
+      int channels = std::max(out_config.number_of_channels, 2);


I think the audio output channel count can be any number. We don't have to report an error if the decoder can't downmix/upmix the audio to match the output channels. For example, if the audio output device supports up to 10 channels and the decoder supports up to 8 output channels, we can set the decoder output channels to 8.

But let's consult with the h5 player team to see their preference.

jasonzhangxx · 2024-09-03T21:19:20Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+
+  SB_DCHECK(samples_decoded <= reader.samples_per_buffer());
+
+  decoded_audio->ShrinkTo(audio_stream_info_.number_of_channels *


It is the same size as at line 164, so it's redundant.

Is samples_decoded always equal to reader.samples_per_buffer()? If so, we can remove the shrinking code and add an SB_DCHECK() here.

Not always. During playback of https://www.youtube.com/watch?v=wDy_YFUOfl4, the first buffer contains 648 samples while the rest contain 960.
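To make the size difference concrete, here is a sketch of the byte count the shrink has to produce. It assumes interleaved int16 output (one commit in this PR is titled "Decode samples to int16"); the stereo channel count in the usage note is an assumption, and `DecodedSizeInBytes` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the PR): bytes occupied by interleaved int16
// PCM for the given channel count and samples per channel.
size_t DecodedSizeInBytes(int channels, int samples_per_channel) {
  return static_cast<size_t>(channels) *
         static_cast<size_t>(samples_per_channel) * sizeof(int16_t);
}
```

Under those assumptions, a stereo buffer of 960 samples needs 3840 bytes while the 648-sample first buffer needs only 2592, so the allocation sized for a full buffer has to be shrunk for that first buffer.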

starboard/shared/libiamf/iamf_config_reader.cc

starboard/shared/libiamf/iamf_audio_decoder.cc

jasonzhangxx · 2024-09-13T17:31:43Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+  SB_DCHECK(is_valid());
+
+  if (input_buffer->size() == 0) {
+    SB_LOG(ERROR) << "Empty input buffer written to IamfAudioDecoder";


As the function returns false, we should report an error here.

jasonzhangxx · 2024-09-13T17:33:53Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+  }
+
+  IamfBufferParser::IamfBufferInfo info;
+  IamfBufferParser().ParseInputBuffer(


We can make ParseInputBuffer() static.

jasonzhangxx · 2024-09-13T17:35:27Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+  IamfBufferParser().ParseInputBuffer(
+      input_buffer, &info, kForceBinauralAudio,
+      kForce6ChannelAudio | kForce8ChannelAudio);
+  if (!info.is_valid()) {


ParseInputBuffer() returns a boolean to indicate whether the input buffer can be parsed properly. We can use that return value instead of an is_valid() function.

jasonzhangxx · 2024-09-13T17:41:47Z

starboard/shared/libiamf/iamf_audio_decoder.cc

+    }
+  } else {
+    IAMF_SoundSystem sound_system = SOUND_SYSTEM_A;
+    if (kForce6ChannelAudio) {


Instead of hardcoding 6 or 8 channels when using surround sound, can we set the output to match the channel count of user audio outputs?

jasonzhangxx · 2024-09-13T17:54:05Z

starboard/shared/libiamf/iamf_buffer_parser.cc

+
+  // Decodes an Leb128 value and stores it in |value|. Returns the number of
+  // bytes read, capped to sizeof(uint32_t). Returns the number of bytes read,
+  // or -1 on error.


Are the leb128 values used in the IAMF header always 4 bytes long? The parsing algorithm here differs from other leb128 parsing algorithms.

jasonzhangxx · 2024-09-13T18:03:00Z

starboard/shared/libiamf/iamf_buffer_parser.h

+  bool ParseInputBuffer(const scoped_refptr<InputBuffer>& input_buffer,
+                        IamfBufferInfo* info,
+                        const bool prefer_binaural_audio,
+                        const bool prefer_surround_audio);


Can |prefer_binaural_audio| and |prefer_surround_audio| both be true? If not, we should combine them into one boolean.
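If the two flags do turn out to be mutually exclusive, one possible shape is a single enum parameter. This is a sketch: `OutputPreference` and `ToPreference` are hypothetical names, and letting binaural win when both flags are set is an arbitrary assumption.

```cpp
// Hypothetical enum (not from the PR) replacing two mutually exclusive
// booleans with one parameter that cannot hold a contradictory state.
enum class OutputPreference { kStereo, kBinaural, kSurround };

// Hypothetical mapping from the existing flags. Binaural takes priority if
// both are set, which is an assumption rather than confirmed behavior.
constexpr OutputPreference ToPreference(bool prefer_binaural_audio,
                                        bool prefer_surround_audio) {
  return prefer_binaural_audio   ? OutputPreference::kBinaural
         : prefer_surround_audio ? OutputPreference::kSurround
                                 : OutputPreference::kStereo;
}
```

A three-state enum is used here rather than one boolean because "neither" (plain stereo) is also a valid request in the quoted signature.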

jasonzhangxx · 2024-09-13T18:05:02Z

starboard/shared/libiamf/iamf_buffer_parser.h

+  // https://aomediacodec.github.io/iamf/v1.0.0-errata.html#paramdefinition
+  bool SkipParamDefinition(BufferReader* reader) const;
+
+  std::unordered_set<uint32_t> binaural_audio_element_ids_;


We can store the audio element ids as local variables in ParseInputBufferInternal() instead of as instance variables.

jasonzhangxx · 2024-09-13T18:12:18Z

starboard/shared/libiamf/iamf_buffer_parser.cc

+        binaural_audio_element_ids_.insert(audio_element_id);
+      } else if (loudspeaker_layout > IA_CHANNEL_LAYOUT_STEREO &&
+                 loudspeaker_layout < IA_CHANNEL_LAYOUT_COUNT) {
+        surround_audio_element_ids_.insert(audio_element_id);


I suggest using one set for both binaural and surround audio elements, storing both the element id and the layout info in the set, like std::unordered_set<std::pair<uint32_t, uint8_t>>.
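One practical note on this suggestion: the standard library provides no `std::hash` specialization for `std::pair`, so the set needs a custom hasher to compile. A minimal sketch follows; `ElementLayout`, `ElementLayoutHash`, and `ElementLayoutSet` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>

// Hypothetical alias (not from the PR): audio element id plus layout value.
using ElementLayout = std::pair<uint32_t, uint8_t>;

// Hypothetical hasher: packs both fields into one 64-bit key and reuses the
// standard integer hash.
struct ElementLayoutHash {
  size_t operator()(const ElementLayout& p) const {
    const uint64_t key = (static_cast<uint64_t>(p.first) << 8) | p.second;
    return std::hash<uint64_t>()(key);
  }
};

using ElementLayoutSet = std::unordered_set<ElementLayout, ElementLayoutHash>;
```

If each element id maps to exactly one layout, a `std::unordered_map<uint32_t, uint8_t>` would avoid the custom hasher altogether.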

osagie98 force-pushed the iamf-decoder branch from 68a425d to 5d82e37 Compare July 10, 2024 19:58

osagie98 force-pushed the iamf-decoder branch from a0dbba4 to 29511e5 Compare July 18, 2024 00:07

osagie98 added 5 commits July 18, 2024 15:39

Implement IAMF audio decoder

9d7dae1

Update config parsing

a0cc666

Decode samples to int16

fc99f42

Enable Android IAMF decode support

0c75785

Add build args

01d087d

Also clean up logs and reformat decoder.

osagie98 force-pushed the iamf-decoder branch from 29511e5 to 01d087d Compare July 18, 2024 22:40

osagie98 marked this pull request as ready for review July 18, 2024 22:43

osagie98 requested review from jasonzhangxx, xiaomings, zhongqiliang and borongc July 18, 2024 22:45

osagie98 added 3 commits July 18, 2024 15:53

Remove flac dep

a63fcd9

Merge branch 'main' into iamf-decoder

731b83f

Move location of libiamf binaries

cd4e326

jasonzhangxx reviewed Aug 5, 2024

starboard/linux/x64x11/shared/platform_configuration/BUILD.gn (outdated, resolved)

jasonzhangxx reviewed Aug 6, 2024

xiaomings reviewed Aug 7, 2024

starboard/shared/libiamf/iamf_config_reader.h (outdated, resolved)

xiaomings reviewed Aug 9, 2024

starboard/shared/libiamf/iamf_config_reader.cc (outdated, resolved)

osagie98 added 3 commits September 1, 2024 03:12

Max IamfConfigReader actually servicable

f820fde

Max IamfAudioDecoder actually servicable

75cade9

Minor formatting

3d57408

osagie98 commented Sep 3, 2024

osagie98 requested review from xiaomings and jasonzhangxx September 3, 2024 16:44

osagie98 added 2 commits September 3, 2024 12:43

Update DCHECK in mix_presentation_id()

03221af

Merge branch 'main' into iamf-decoder

d456970

jasonzhangxx reviewed Sep 3, 2024

IamfConfigReader becomes IamfBufferParser

7678d98

osagie98 requested a review from jasonzhangxx September 5, 2024 00:32

Update handling of surround audio configurations

3efad63

jasonzhangxx reviewed Sep 13, 2024
